df_od <- read_csv(here::here("data", "VSRR_Provisional_Drug_Overdose_Death_Counts.csv"))
df_od$Date <- parse_date_time(paste(tolower(df_od$Month), df_od$Year), orders=c("by"))
df_od$NumDeaths <- df_od$`Data Value`
df_od <- df_od %>% filter(`Indicator`=='Number of Deaths' & `State` != 'US')
df_od <- df_od %>% left_join(., pop, by=c("State"="State"))
df_od$FatalitiesPerCap <- df_od$`Data Value`/df_od$pop*100
df_od <- df_od[order(df_od$FatalitiesPerCap),]

p1 <- ggplot(df_od, aes(x=Date, y=`FatalitiesPerCap`, group=`StateName`, colour=`StateName`)) +
  geom_line() +
  gghighlight(max_highlight = 8L, max(`FatalitiesPerCap`),  use_direct_label=FALSE, label_key=`FatalitiesPerCap`) +
  labs(title="WEST VIRGINIA'S OPIOID PROBLEM STANDS \nIN STARK CONTRAST TO REST OF COUNTRY", 
       subtitle = "Drug Overdoses per Capita Over the Years",
       caption="SOURCE: NATIONAL CENTER FOR HEALTH STATISTICS",
       x="DATE", y="% FATAL DRUG OVERDOSES PER CAPITA") +
  guides(color=guide_legend(title="STATE"))+
  jtheme + jcolor

p1

Arguably no one has been hit harder by the opioid epidemic than those living in the rust belt and the rural south. One can see above that rural America (Alabama, Arkansas, Kentucky, Tennesee, and West Virginia) comprise some of the states hit hardest per capita by opioid related deaths. Many factors contribute to this scourge on the region, including economic disenfranchisement and high rates of disability enrollment.

df_ods_by_state <- aggregate(df_od, by=list(df_od$StateName), FUN=mean, na.rm=TRUE)
df_ods_by_state$StateName <- df_ods_by_state$Group.1
df_ods <- df_ods_by_state %>% select(StateName, FatalitiesPerCap)
mean_ods <- mean(df_ods$FatalitiesPerCap)

df_ods$Above <- df_ods$FatalitiesPerCap > mean_ods

df_ods$Above[df_ods$Above == TRUE] <- "ABOVE AVERAGE"
df_ods$Above[df_ods$Above == FALSE] <- "BELOW AVERAGE"


p2 <- ggplot(df_ods, aes(x=reorder(StateName, FatalitiesPerCap), y=FatalitiesPerCap)) + 
  geom_point(stat='identity', aes(col=Above, label=format(FatalitiesPerCap, digits=2, nsmall=2)), size=6)  +
  scale_color_manual(name="FATALITIES BY OVERDOSE (% OF POP)", 
                     labels = c("BELOW AVERAGE", "ABOVE AVERAGE"), values=jpalette) +
  labs(title="NEW YORK AND WEST VIRGINIA LIE ON OPPOSITE \nPOLES OF THE OPIOID EPIDEMIC", 
       subtitle="Fatal Drug Overdoses by State",
       caption="SOURCE: NATIONAL CENTER FOR HEALTH STATISTICS",
       x="STATE", y="FATALITIES PER CAPITA (% OF POP)") +
  ylim(0.4, 1.4) +
  jtheme + jfill + jcolor + 
  theme(
     panel.grid.major.x = element_blank()
  ) + coord_flip() 
  
p2

While West Virginia is most intensely affected among US states, no state is immune to the epidemic; even in New York, the state whose opioid-related death rate is lowest per capita, one half of one percent of the population will die from a drug-related overdose. The standard deviation of opioid abuse’s impact on states is smaller than one might guess if they are only reading articles (https://www.nytimes.com/interactive/2018/us/west-virginia-opioids.html) about the plight of West Virginia. While West Virginia is an outlier as shown in the above plot, this is a nationwide epidemic.

DRUG_NAMES <- c("VICODIN", "OXYCODONE", "OXYCONTIN", "PERCOCET", 
                "OPANA", "KADIAN", "AVINZA", "FENTANYL")

fname_drug_utilization <- "State_Drug_Utilization_Data_"

# First time processing of DataFrame
#df_util <- read_csv(here::here("data", paste(fname_drug_utilization, "2008", ".csv", sep=""))) %>% filter(`Product Name` %in% DRUG_NAMES)
#for (i in 2009:2018) {
#  df_util <- rbind(df_util, read_csv(here::here("data", paste(fname_drug_utilization, i, ".csv", sep=""))) %>% filter(`Product Name` %in% DRUG_NAMES)) 
#}

#readr::write_csv(df_util, here::here("data", "df_util.csv"))
df_util <- read_csv(here::here("data", "df_util.csv"))
df_rx <- aggregate(df_util[,c("Number of Prescriptions", "Total Amount Reimbursed")], by=list(df_util$Year, df_util$`Product Name`), FUN="sum", na.rm=TRUE)
df_rx$`Drug Name` <- df_rx$Group.2
df_rxImportant <- df_rx %>% filter(`Drug Name` %in% c("OXYCODONE", "FENTANYL",  "KADIAN", "OXYCONTIN"))

p3 <-  df_rxImportant %>% ggplot(aes(Group.1, `Total Amount Reimbursed`, size=`Number of Prescriptions`, colour=`Drug Name`)) +
  geom_point(alpha = 0.7) +
  geom_line(size=1) + 
  scale_y_continuous(labels = dollar) +
  labs(title = "PAINKILLER REIMBURSEMENTS SHRINK WHILE \nPRESCRIPTIONS REMAIN FAIRLY CONSTANT",
       subtitle = "Expense Reimbursement & Prescriptions Over Time\n\n",
       caption = "SOURCE: DATA.MEDICAID.GOV",
       x = 'YEAR', y = 'TOTAL $ REIMBURSED') +
  jtheme + jfill + jcolor

p3$labels$size <- "NUMBER OF PRESCRIPTIONS"
p3$labels$colour <- "DRUG NAME"

p3

2015 was a clear reckoning for the US Government’s reimbursement of presciption opioids, those contributing most towards the opioid epidemic. The data above shows the prescriptions in cicle size over time, along with the total amount reimbursed, thanks to medicaid data. Economists might anticipate that as less and less reimbusement money is available for prescription opioids, doctors would prescribe fewer pills. Though we do see slightly larger circles as reimbursements go up, it seems that prescriptions aren’t very responsive to drug reimbursement. This relationship may show different results in comparing reimbursement to the number of filled prescriptions at the pharmacy.

df_util$StateName <- abbr2state(df_util$State)

# Drug Overdoses
DATA_OVERDOSE <- "VSRR_Provisional_Drug_Overdose_Death_Counts.csv"
df_od <- read_csv(here::here("data", DATA_OVERDOSE))
df_od$Date <- parse_date_time(paste(tolower(df_od$Month), df_od$Year), orders=c("by"))
df_od$NumDeaths <- df_od$`Data Value`
df_od <- df_od %>% filter(`Indicator`=='Number of Deaths' & `State` != 'US')
df_od <- df_od %>% left_join(., pop, by=c("State"="State")) %>% left_join(., df_util, by=c("State"="State", "Year"="Year"))

df_od$FatalitiesPer <- df_od$`Data Value`*1000/df_od$pop
df_od$`Number of Prescriptions` <- df_od$`Number of Prescriptions`*1000/df_od$pop

yr=2018

df_ods_by_state <- df_od %>% filter(Year == yr)
df_ods_by_state <- aggregate(list(FatalitiesPer=df_ods_by_state$FatalitiesPer, NumRxPer=df_ods_by_state$`Number of Prescriptions`), by=list(StateName=df_ods_by_state$StateName.x), FUN="sum", na.rm=TRUE)

# Import Cartogram Polygons, merge drug overdose data
spdf <- geojson_read(here::here("data", "us_states_hexgrid.geojson"),  what = "sp")
spdf@data = spdf@data %>% mutate(google_name = gsub(" \\(United States\\)", "", google_name)) 
spdf@data = spdf@data %>% left_join(., df_ods_by_state, by=c("google_name"="StateName"))
cartogram <- cartogram_cont(spdf, 'FatalitiesPer')

carto_fortified <- tidy(cartogram, region = "google_name") %>% left_join(. , cartogram@data, by=c("id"="google_name")) 
centers <- cbind.data.frame(data.frame(gCentroid(cartogram, byid=TRUE), id=cartogram@data$iso3166_2))   

p_green <- ggplot() +
    geom_polygon(data = carto_fortified, aes(fill = FatalitiesPer, x = long, y = lat, group = group) , size=0.05, alpha=0.9, color="#222222") +
    scale_fill_gradient(low=BLUE, high=RED, name="OPIOID RELATED DEATHS", guide=guide_legend( keyheight = unit(3, units = "mm"), keywidth=unit(15, units = "mm"), title.position = 'top', label.position = "bottom") ) +
    geom_text(family="Quick", data=centers, aes(x=x, y=y, label=id), color="#222222", size=3, alpha=0.6) +
    labs( title=paste("WHERE OVERDOSE DEATHS HIT HARDEST: ",yr),
          subtitle="Cartogram of Opioid Related Deaths Across the US",
          caption="SOURCE: VSSR, DATA.MEDICAID.GOV") + 
    coord_map() +
    jtheme_nogrid +
    theme(
        plot.background = element_rect(fill=bg_color, color = NA),
        panel.background =  element_rect(fill=bg_color, color = NA),
        strip.background =  element_rect(fill=bg_color, color = NA),
        strip.background.x =  element_rect(fill=bg_color, color = NA),
        strip.background.y =  element_rect(fill=bg_color, color = NA),
        legend.background = element_rect(fill=bg_color, color=NA),
        legend.box.background = element_rect(fill=bg_color, color=NA),
        plot.margin = margin(53,0,53,0,"pt")
    ) 
p_blue <- ggplot() +
    geom_polygon(data = carto_fortified, 
                 aes(fill = NumRxPer, x = long, y = lat, group = group) , 
                 size=0.05, alpha=0.9, color="#222222") +
    scale_fill_gradient(low=BLUE, high=RED, name="PRESCRIPTIONS PER 100K", 
                         guide=guide_legend( keyheight = unit(3, units = "mm"), keywidth=unit(15, units = "mm"), title.position = 'top', label.position = "bottom") ) +
    geom_text(family="Quick", data=centers, aes(x=x, y=y, label=id), color="#222222", size=3, alpha=0.6) +
    labs( title=paste("WHERE DRUGS LEAVE THE HOSPITAL: ",yr),
          subtitle="Cartogram of Key Opioid Prescription Rates Across US the US",
          caption="SOURCE: VSSR, DATA.MEDICAID.GOV") +
    coord_map()  +
    jtheme_nogrid +
    theme(
        plot.background = element_rect(fill=bg_color, color = NA),
        panel.background =  element_rect(fill=bg_color, color = NA),
        strip.background =  element_rect(fill=bg_color, color = NA),
        strip.background.x =  element_rect(fill=bg_color, color = NA),
        strip.background.y =  element_rect(fill=bg_color, color = NA),
        legend.background = element_rect(fill=bg_color, color=NA),
        legend.box.background = element_rect(fill=bg_color, color=NA),
        plot.margin = margin(46,0,47,0,"pt")
    ) 



#grid.arrange(p_green, p_blue, nrow = 2)
p_green

The rust belt in 2018 saw the worst of opioid related drug overdoses. This likely has to do with smuggling routes, with major hubs in cities like Chicago. The rural west appears to be relatively minimally affected, with states along the Sierra Nevadas showing low rates of overdose deaths. In the Northeast, results are mixed, with coastal mid-atlantic states faring much better than the rural border states like New Hampshire.

p_blue

Correlation to states with high prescription rates doesn’t tell much of a story when comparing cartograms. Maryland had by far the most opioid prescriptions, followed by a cluster of states in the Southwest, omitting Utah, whose large Mormon population dampens its statistics related to drug utilization. There doesn’t appear to be a relationship within given years between opioid prescription and abuse within states. This could hint at a black market, or that most people are not getting addicted to opioids from injury.

US <- map_data("world") %>% filter(region=="USA")
data=us.cities

df_util$`NUMBER OF PRESCRIPTIONS` = df_util$`Number of Prescriptions`
df_util$`PRODUCT NAME` = df_util$`Product Name`
df_util %>% filter(df_util$`Product Name` %in% c("OXYCODONE", "OXYCONTIN")) %>%
  ggplot() +
    geom_polygon(data = US, aes(x=long, y = lat, group = group), fill="grey", alpha=0.3) +
    geom_point(width=2, height=2, aes(x=Longitude, y=Latitude, size=`NUMBER OF PRESCRIPTIONS`, color=`PRODUCT NAME`), shape=20, stroke=FALSE) +
    scale_size_continuous(name="NUMBER OF PRESCRIPTIONS", range=c(1,12)) +
    xlim(-125, -70) + ylim(25,50) + coord_map() + 
    guides(color=guide_legend(title="PRODUCT NAME")) +
    jtheme_nogrid + jcolor +
    labs(title = "OXY'S MOVEMENT SHOWS A LULL\nBETWEEN 2012-2014", subtitle='PRESCRIPTIONS THROUGH THE YEARS: {frame_time}', caption="SOURCE: MEDICAID.GOV") +
  transition_time(Year) +
    ease_aes('linear', interval=1) 

Shown above are the proliferation of prescriptions of two of the most dangerously addictive opioids, Oxycodone and Oxycontin, by their site of prescription over the past ten years from 2008 to 2018. Circle sizes show relative number of prescriptions for each location overlayed on the US map. Alaska and Hawaii were omitted from this graphic, and were very low in total number of prescriptions. Interestingly, there is a noticeable lull between the years of 2012 and 2014, where fewer prescriptions are doled out. Additionally, one might expect these to follow population segments, however, while there is a dense corridor of prescriptions on the east coast, the west coast does not display similar levels of opioid prescription rates at any point during the sampled time period.

df_fda <- read_tsv(here::here("data", "fda", "Submissions.txt"))
df_sub_class <- read_tsv(here::here("data", "fda", "SubmissionClass_Lookup.txt"))
df_fda <- df_fda %>% left_join(., df_sub_class, by=c("SubmissionClassCodeID"="SubmissionClassCodeID"))

df_fda <- df_fda %>% filter(!(SubmissionClassCodeDescription %in% c("Labeling", "Not Applicable", "Supplement", "Efficacy", "REMS", "Medical Gas", NA)))
df_fda$Year <- format(as.Date(df_fda$SubmissionStatusDate, format="%Y-%m-%d"),"%Y")
df <- df_fda %>% select(Year, "SubmissionClassCode")
df$Type <- case_when(
  df$`SubmissionClassCode`  == "TYPE 1" ~ "NEW MOLECULAR ENTITY",
  df$`SubmissionClassCode`  == "TYPE 2" ~ "NEW ACTIVE INGREDIENT",
  df$`SubmissionClassCode`  == "TYPE 3" ~ "NEW DOSAGE",
  df$`SubmissionClassCode`  == "TYPE 4" ~ "NEW COMBINATION",
  df$`SubmissionClassCode`  == "TYPE 5" ~ "NEW MANUFACTURER",
  TRUE ~ "OTHER"
)
df$ones <- 1

df7 <- aggregate(df$ones, by=list("Year"=strtoi(df$Year), "Submission Type"=df$Type), FUN="sum", na.rm=TRUE)



glimpse(df7)
## Observations: 419
## Variables: 3
## $ Year              <int> 1950, 1951, 1952, 1953, 1955, 1956, 1957, 19...
## $ `Submission Type` <chr> "NEW ACTIVE INGREDIENT", "NEW ACTIVE INGREDI...
## $ x                 <dbl> 2, 2, 2, 4, 5, 4, 4, 7, 2, 2, 3, 2, 1, 5, 1,...
df7 %>% ggplot(aes(x=`Submission Type`, y=x, fill=`Submission Type`, color=`Submission Type`)) +
    geom_bar(stat='identity') +
    labs(title = "FDA DRUG SUBMISSION TYPES", subtitle='SUBMISSION TRENDS THROUGH THE YEARS: {frame_time}', y="TOTAL", caption="SOURCE: FDA.GOV")+
  guides(colour=guide_legend(nrow=30)) + jtheme + jfill + ylim(0,100) + 
  theme(axis.text.x = element_text(angle=90, hjust=1)) +
  transition_states(year, transition_length = 2, state_length=3) +
  transition_time(Year) +
  ease_aes('linear', interval=1)

While over the years, various types of drug submissions have fluctuated, the early 2000s showed major spikes in “Bioequivalent” drug submission type, new drugs to perform similar functions as existing drugs. Other submission types, for context are “New Molecular Entities” (Type 1), “New Active Ingredient” (Type 2), “New Dosage Form” (Type 3), “New Combination” (Type 4), “New Formulation/Manufacturer” (Type 5), etc. Note that in the 70s and 80s, New Manufacturers spiked. In the 2000s, Bioequivalence, New Dosage Forms, and New Combinations spiked most. This appears to show a shift from acquisitions of drug manufacturers into rebranding of existing drugs, and pivots towards different dosages. Further research could shed light on the nature of Type 3 dosage changes, and if the FDA has approved dosage increases at an extraordinary rate.

wv_shape <- st_read(here::here("data", "tl_2018_54_cousub.shp"))
## Reading layer `tl_2018_54_cousub' from data source `C:\Users\justincohler\workspaces\opioid-epidemic-study\data\tl_2018_54_cousub.shp' using driver `ESRI Shapefile'
## Simple feature collection with 230 features and 18 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -82.64459 ymin: 37.20154 xmax: -77.71952 ymax: 40.6388
## epsg (SRID):    4269
## proj4string:    +proj=longlat +datum=NAD83 +no_defs
wv_shape$FIPS <- strtoi(paste(wv_shape$STATEFP, wv_shape$COUNTYFP, sep=''))

wv_df <- read_csv(here::here("data", "west_virginia_county_data.csv"))

wv <- wv_shape %>% 
  left_join(wv_df, by = c("FIPS" = "FIPS"))
wv$`OD DEATH RATE` <- wv$`Drug Overdose Mortality Rate`

ggplot(data = wv) +
  geom_sf(aes(fill=`OD DEATH RATE`, alpha=`% Unemployed`, label=ifelse(FIPS > 54108 & FIPS < 54110, "Wyoming County", NA)), size=0.05) + 
  labs(title="WEST VIRGINIA'S CONCENTRATED AREAS OF\nUNEMPLOYMENT ARE HIT HARDEST BY DRUG ABUSE",
       caption="SOURCE: WEST VIRGINIA DEPT. OF HHR") +
  annotate("text", x = -80.2, y = 40.3, family="Consolas", color=text_color, hjust = 0, label = "2018's unemployment rate\nand number of deaths\nby overdose are\nhighly correlated") +
  #geom_text(aes(label = ifelse(FIPS==54109, "Wyoming County", "")), stat="identity", vjust=-1)+
  #geom_label(data=subset(wv, FIPS==54109), aes(label="Wyoming County")) +
  jtheme_nogrid + jfillc + jalpha +
  guides(
   color = guide_colorbar(title="OD DEATH RATE"),
   alpha = guide_legend(title="% UNEMPLOYED\n(OPACITY)")
 ) +
  theme(
    plot.title = element_text(),
    plot.margin=grid::unit(c(0,0,0,0), "mm")
  )

Digging into the state where the opioid epidemic has run most rampant, the data clearly show a harrowing case in the southwest of West Virginia. Wyoming County has the highest rate of drug overdoses resulting in death, and appears as the brightest pink in the southwest corridor. The more opaque counties represent a greater percentage of those unemployed. In the graphic above, as the percent unemployment exceeds double digits, the opacity appears as solid. Unsurprisingly, the more solid colors align with those counties where the OD rates soar, showing how all encompassing the opioid epidemic is; among the data there is a feedback loop of strongly correlated adverse outcomes (% unemployment, low levels of education, etc.) with high rates of opioid abuse.

ggplot(data = wv) +
  geom_sf(aes(fill=`% Food Insecure`, alpha=`Food Environment Index`), size=0.05) + 
  labs(title="SOUTHWEST WEST VIRGINIA, WHERE MOST OPIOID ABUSE OCCURS,\n EXPERIENCES CRIPPLING FOOD INSECURITY",
       caption="SOURCE: WEST VIRGINIA DEPT. OF HHR") +
  annotate("text", x = -80.4, y = 40.3, family="Consolas", color=text_color, hjust = 0, label = "2018's Food Insecurity \nPercentages, colored along\nwith food availability\nindex encoded with opacity\nshows familiar hotspots") +
  jtheme_nogrid + jfillc2 + jalpha +
  guides(
   color = guide_colorbar(title="% Food Insecure"),
   alpha = guide_legend(title="Food Environment Index\n(OPACITY)")
 ) +
  theme(
    plot.title = element_text(),
    plot.margin=grid::unit(c(0,0,0,0), "mm")
  )

In exploring food insecurity, the majority of those who are food insecure reside in counties where opioid abuse is prevalant. In the above graphic, we now see the more opaque the county, the better their Food Environment Index is, a Bureau of Health and Human Services index of grocery options. Additionally, we see the percentage of those deemed highly food insecure appear more blue. Wyoming County once more is among the most food insecure of all counties in Wyoming. The county is overwhelmingly white, at 97.3%, and unremarkable in its graduation rate and english proficiency among Wyoming Counties.